Abstract

An alternative method of finding a common metric for separate calibrations through the 
use of a common (anchor) set of items is presented. Based on Raju’s (1988) method of 
calculating the area between two item response functions, this (area-minimization) 
method minimizes the sum of the squared exact unsigned areas of each of the common 
items. This new method and five other currently available linking methods are illustrated 
with an empirical example. Additional research in this area, especially to
establish the degree of congruence among the various linking methods, is strongly
recommended.
Developing a Common Metric in Item Response
Theory: An Area-Minimization Approach 
It is well known in item response theory (IRT) that estimates of item parameters 
and abilities from two different calibrations are not directly comparable because they may 
not be on a common metric (Hambleton, Swaminathan, & Rogers, 1991; Lord, 1980). In 
general, two independent IRT calibrations result in two different metrics for item 
parameters and examinee abilities. In many applications of IRT, such as vertical and 
horizontal equating and differential item functioning (DIF), estimates of item parameters 
and abilities from two independent calibrations must be placed on a common metric prior 
to their use. The development of such a common metric involves transforming the metric 
from one calibration to the metric from the other calibration with two appropriately 
defined linking constants: a multiplicative constant (A) and an additive constant (B).
For example, an IRT calibration of a test with the 2-parameter logistic (2-PL) model will
result in an a-parameter and a b-parameter for each item in the test. Let the item
parameter estimates from the first calibration be denoted as $a_{i1}$ and $b_{i1}$ for item i.
Similarly, let the item parameter estimates for item i from the second calibration be
denoted as $a_{i2}$ and $b_{i2}$. The transformation that puts the metric from the second
calibration onto the metric from the first calibration may be defined as (Hambleton et al.,
1991; Lord, 1980):




$$a^{*}_{i2} = \frac{a_{i2}}{A} \tag{1}$$

and

$$b^{*}_{i2} = A\,b_{i2} + B. \tag{2}$$
The transformed item parameters ($a^{*}_{i2}$ and $b^{*}_{i2}$) from the second calibration are then said
to be on the same metric as the item parameters ($a_{i1}$ and $b_{i1}$) from the first calibration. The
transformation given in Equation (2) is also valid for transforming the ability ($\theta$)
estimates from the second calibration to the metric associated with the first calibration.
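
As a minimal sketch of how the two linking constants are applied, the following Python functions implement the linear transformation of Equations (1) and (2). The function names are hypothetical; the numerical values are Form L anchor item 7 and the Mean-Sigma constants as printed in Tables 2, 4, and 5 of this paper.

```python
def to_first_metric(a2, b2, A, B):
    """Place one item's second-calibration estimates on the first metric.

    a* = a / A (Equation 1) and b* = A * b + B (Equation 2).
    """
    return a2 / A, A * b2 + B


def theta_to_first_metric(theta2, A, B):
    """Ability estimates transform in the same way as b-parameters."""
    return A * theta2 + B


# Form L anchor item 7 (a = 0.93, b = -0.60) with the Mean-Sigma constants.
a_star, b_star = to_first_metric(0.93, -0.60, A=0.9975, B=0.1019)
print(round(a_star, 2), round(b_star, 2))
```

Because a, b, and theta are transformed together, the quantity a(theta - b), and hence the 2-PL response probability, is unchanged by the transformation.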

Current Methods for Developing a Common Metric 
There are currently several methods for deriving the two transformation/linking
constants (A and B). Five of these methods are briefly described below.

Mean-Mean Method 

The Mean-Mean method is a variant of the method developed by Loyd and 
Hoover (1980). It sets the means of the a- and b-parameters of the common items in the
second test equal to those of the first test. That is, the transformation constants are
defined as follows:

$$A = \frac{M_{a2}}{M_{a1}} \tag{3}$$

and

$$B = M_{b1} - A\,M_{b2}, \tag{4}$$

where $M_{ai}$ and $M_{bi}$ refer to the means of the a-parameters and b-parameters, respectively,
for test/calibration i or for the set of common items from test/calibration i. Additional
information about this method may be found in Baker and Al-Karni (1991) and Kolen 
and Brennan (1995). 
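
The Mean-Mean constants can be computed in a few lines. The sketch below is illustrative only: the function name is hypothetical, and the data are the two-decimal anchor-item estimates printed in Tables 4 and 5 (Form K as calibration 1, unequated Form L as calibration 2), so the results should agree with Table 2 only up to rounding.

```python
from statistics import mean

# Anchor-item estimates as printed in Tables 4 and 5.
a_K = [0.97, 1.72, 1.38, 1.50, 1.57, 1.54, 0.88, 1.19, 0.94, 1.33, 0.74, 1.02, 0.89]
a_L = [0.93, 1.53, 1.31, 1.47, 1.62, 1.43, 0.95, 1.14, 0.85, 1.41, 0.81, 1.05, 0.79]
b_K = [-0.43, -0.05, 0.68, 0.09, 0.70, 0.37, 0.87, 1.49, 1.59, 0.86, 2.46, 0.96, 1.36]
b_L = [-0.60, -0.22, 0.62, -0.04, 0.56, 0.27, 0.82, 1.40, 1.62, 0.79, 2.15, 0.99, 1.29]


def mean_mean(a1, b1, a2, b2):
    """Mean-Mean linking constants: Equations (3) and (4)."""
    A = mean(a2) / mean(a1)
    B = mean(b1) - A * mean(b2)
    return A, B


A_mm, B_mm = mean_mean(a_K, b_K, a_L, b_L)
print(f"A = {A_mm:.4f}, B = {B_mm:.4f}")
```

Table 2 reports A = 0.9758 and B = 0.1180 for this method; small discrepancies are expected because the tabled estimates are rounded.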
Mean-Sigma Method

In the Mean-Sigma method (Marco, 1977), the means and standard deviations of
the b-parameters from the first and second administrations/calibrations determine the A
and B constants:

$$A = \frac{SD_{b(1)}}{SD_{b(2)}} \tag{5}$$

and

$$B = M_{b(1)} - A\,M_{b(2)}, \tag{6}$$

where $SD_{b(i)}$ refers to the standard deviation of the b-parameters in test/calibration i or in the
set of common items from test/calibration i.
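
A comparable sketch for the Mean-Sigma method follows. It assumes the sample standard deviation (divisor n - 1), although the ratio in Equation (5) is the same for either divisor; the function name is hypothetical and the data are the rounded b-parameter estimates from Table 5.

```python
from statistics import mean, stdev

# b-parameter estimates of the 13 anchor items as printed in Table 5.
b_K = [-0.43, -0.05, 0.68, 0.09, 0.70, 0.37, 0.87, 1.49, 1.59, 0.86, 2.46, 0.96, 1.36]
b_L = [-0.60, -0.22, 0.62, -0.04, 0.56, 0.27, 0.82, 1.40, 1.62, 0.79, 2.15, 0.99, 1.29]


def mean_sigma(b1, b2):
    """Mean-Sigma linking constants: Equations (5) and (6)."""
    A = stdev(b1) / stdev(b2)
    B = mean(b1) - A * mean(b2)
    return A, B


A_ms, B_ms = mean_sigma(b_K, b_L)
print(f"A = {A_ms:.4f}, B = {B_ms:.4f}")
```

Table 2 reports A = 0.9975 and B = 0.1019 for this method; the rounded inputs should reproduce these values to about three decimals.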

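χ² Method

The χ² method (Divgi, 1985) uses the standard errors of the item parameter
estimates from both calibrations. Define $\Sigma_i$ such that for item i,

$$\Sigma_i = \begin{bmatrix} I_{iaa} & I_{iab} \\ I_{iab} & I_{ibb} \end{bmatrix}^{-1}, \tag{7}$$

where, for the two-parameter model, $I_{iaa}$, $I_{iab}$, and $I_{ibb}$ are defined by Lord (1980, p.
191) as:

$$I_{iaa} = D^2 \sum_{j=1}^{N} (\theta_j - b_i)^2 \, P_{ij}(1 - P_{ij}), \tag{8}$$

$$I_{iab} = -D^2 a_i \sum_{j=1}^{N} (\theta_j - b_i) \, P_{ij}(1 - P_{ij}), \tag{9}$$

and

$$I_{ibb} = (D a_i)^2 \sum_{j=1}^{N} \left[ P_{ij}(1 - P_{ij}) \right]. \tag{10}$$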

It can be shown that for the equated item parameters from the second calibration,

$$I^{*}_{iaa} = A^2 I_{iaa}, \tag{11}$$

$$I^{*}_{iab} = I_{iab}, \tag{12}$$

and

$$I^{*}_{ibb} = \frac{1}{A^2} I_{ibb}. \tag{13}$$

To distinguish among the variance-covariance matrices of the parameter estimates
for item i from the first and second calibrations, let $\Sigma_{i1}$ and $\Sigma^{*}_{i2}$ denote, respectively, the
item parameter variance-covariance matrices from the first calibration and the second
calibration after equating. Letting

$$\Delta_i = \begin{bmatrix} a_{i1} - a^{*}_{i2} \\ b_{i1} - b^{*}_{i2} \end{bmatrix}, \tag{14}$$

the quadratic form to be minimized is

$$Q_i = \Delta_i' \left( \Sigma_{i1} + \Sigma^{*}_{i2} \right)^{-1} \Delta_i. \tag{15}$$

In this investigation, a variant of Equation (15) is used based on methods
proposed by Oshima et al. (2000). Specifically, for an n-item test, the function to be
minimized is

[Equation (16), a sum over the items i = 1, ..., n of a squared term, is not legible in the source.]

Test Characteristic Curve (TCC) Method

The Test Characteristic Curve (TCC) method (Stocking & Lord, 1983) uses item
parameter estimates from both calibrations, as well as a spaced set of abilities, in
minimizing the appropriate multivariate functions needed for estimating the linking
constants (A and B). Following Oshima et al. (2000), the function to be minimized in
the TCC approach may be expressed as:

$$F_{TCC}(A, B) = \sum_{j=1}^{N} \left[ \sum_{i=1}^{n} P_{i1}(\theta_j) - \sum_{i=1}^{n} P^{*}_{i2}(\theta_j) \right]^2, \tag{17}$$

where N stands for the number of spaced abilities used and n represents the number of
items. The $P_{i1}(\theta)$ and $P^{*}_{i2}(\theta)$, for the 2-PL model, may be expressed as:

$$P_{i1}(\theta) = \frac{e^{D a_{i1} (\theta - b_{i1})}}{1 + e^{D a_{i1} (\theta - b_{i1})}} \tag{18}$$

and

$$P^{*}_{i2}(\theta) = \frac{e^{D a^{*}_{i2} (\theta - b^{*}_{i2})}}{1 + e^{D a^{*}_{i2} (\theta - b^{*}_{i2})}}, \tag{19}$$

where D and e are constants (Hambleton, Swaminathan, & Rogers, 1991). The general
idea in the TCC approach is to find A and B which minimize Equation (17).

Item Characteristic Curve Method

Like the TCC method, the Item Characteristic Curve (ICC) method (Haebara,
1980) uses item parameter estimates from both calibrations, as well as a spaced set of
abilities, to minimize the multivariate function

$$F_{ICC}(A, B) = \sum_{j=1}^{N} \sum_{i=1}^{n} \left[ P_{i1}(\theta_j) - P^{*}_{i2}(\theta_j) \right]^2. \tag{20}$$
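
To make the two characteristic-curve criteria concrete, the sketch below evaluates a TCC-type and an ICC-type loss for given linking constants. Several choices are assumptions rather than details taken from the paper: D = 1.7, a grid of 21 equally spaced thetas from -4 to 4, unnormalized sums, and the hypothetical function names. The item values are the rounded anchor-item estimates from Tables 4 and 5.

```python
import math

D = 1.7  # scaling constant (assumed value)

# (a, b) pairs for the 13 anchor items as printed in Tables 4 and 5.
FORM_K = [(0.97, -0.43), (1.72, -0.05), (1.38, 0.68), (1.50, 0.09), (1.57, 0.70),
          (1.54, 0.37), (0.88, 0.87), (1.19, 1.49), (0.94, 1.59), (1.33, 0.86),
          (0.74, 2.46), (1.02, 0.96), (0.89, 1.36)]
FORM_L = [(0.93, -0.60), (1.53, -0.22), (1.31, 0.62), (1.47, -0.04), (1.62, 0.56),
          (1.43, 0.27), (0.95, 0.82), (1.14, 1.40), (0.85, 1.62), (1.41, 0.79),
          (0.81, 2.15), (1.05, 0.99), (0.79, 1.29)]

THETAS = [-4.0 + 0.4 * k for k in range(21)]  # assumed theta grid


def p2pl(theta, a, b):
    """2-PL item response function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def linking_criteria(items1, items2, A, B, thetas=THETAS):
    """Return (TCC-type loss, ICC-type loss) for linking constants A and B."""
    tcc = icc = 0.0
    for theta in thetas:
        # Item-level differences after putting calibration 2 on metric 1.
        diffs = [p2pl(theta, a1, b1) - p2pl(theta, a2 / A, A * b2 + B)
                 for (a1, b1), (a2, b2) in zip(items1, items2)]
        tcc += sum(diffs) ** 2            # differences summed, then squared
        icc += sum(d * d for d in diffs)  # differences squared, then summed
    return tcc, icc


print(linking_criteria(FORM_K, FORM_L, 1.0, 0.0))        # no linking
print(linking_criteria(FORM_K, FORM_L, 0.9761, 0.1020))  # Stocking & Lord constants, Table 2
```

In practice either loss would be passed to a numerical optimizer over A and B; here the functions are only evaluated at two candidate pairs of constants.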



Some of these approaches, originally developed for unidimensional, dichotomous 
IRT models, were later expanded to include multidimensional, dichotomous IRT models 
(Davey, Oshima, & Lee, 1996; Oshima, Davey, & Lee, 2000) and unidimensional,
graded response IRT models (Baker, 1992). Computer programs for estimating the 
needed transformation constants are also available from Baker (1992) and Lee and 
Oshima (1996). Additional information about the previously described linking methods 
may be found in Kolen and Brennan (1995). 

A careful examination of Equations (17) and (20) leads to two important
observations. First, both equations depend on the specific number and type of thetas
chosen for minimization. Stocking and Lord (1983) recommended 200 spaced thetas, and
Baker and Al-Karni's (1991) example used 21 spaced thetas. As Oshima, Davey, and Lee
(2000) noted, there is a certain degree of arbitrariness in the number and type of thetas
chosen. Also, Kolen and Brennan (1995) describe five different examinee or theta selection
procedures for use in developing a common metric. Second, in the TCC approach, there
is the potential for cancellation at the test level; that is, positive and negative differences
in item probabilities, $P_{i1}(\theta) - P^{*}_{i2}(\theta)$, may cancel each other out. However, this is not a
concern for the ICC approach because the item-level probability differences are squared
prior to summing them across thetas.
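
The cancellation effect can be seen in a deliberately artificial example. In the sketch below, two hypothetical items have their difficulties swapped between calibrations, so the item-level curves clearly differ while the two test characteristic curves coincide. The value D = 1.7, the theta grid, and the item parameters are all assumptions made for illustration, and no transformation is applied (A = 1, B = 0).

```python
import math

D = 1.7  # scaling constant (assumed value)


def p2pl(theta, a, b):
    """2-PL item response function."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical two-item test: the difficulties are swapped across calibrations.
cal1 = [(1.0, 0.0), (1.0, 0.5)]
cal2 = [(1.0, 0.5), (1.0, 0.0)]
thetas = [-4.0 + 0.4 * k for k in range(21)]

# Test-level loss: item differences cancel before squaring.
tcc_loss = sum(
    (sum(p2pl(t, a, b) for a, b in cal1) - sum(p2pl(t, a, b) for a, b in cal2)) ** 2
    for t in thetas
)

# Item-level loss: each difference is squared before summing.
icc_loss = sum(
    (p2pl(t, a1, b1) - p2pl(t, a2, b2)) ** 2
    for t in thetas
    for (a1, b1), (a2, b2) in zip(cal1, cal2)
)

print(tcc_loss, icc_loss)
```

The test-level loss is zero even though neither item matches its counterpart, whereas the item-level loss is clearly positive.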

The purpose of this investigation is to offer a new approach for developing a 
common IRT metric that avoids the problems of arbitrariness (due to the number and type 
of thetas used) and cancellation effects. This new approach is based on Raju’s (1988) 
exact unsigned area measure. Also included in this presentation is an empirical example 
to illustrate the new procedure as well as the five previously described linking
procedures.

An Area-Minimization Approach

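According to Raju (1988), the exact unsigned area (EUA) between two ICCs for
item i, based on the 2-PL model, may be expressed as

$$EUA_i = |H_i|, \tag{21}$$

where

$$H_i = \frac{2(a_{i1} - a^{*}_{i2})}{D a_{i1} a^{*}_{i2}} \ln\left\{ 1 + \exp\left[ \frac{D a_{i1} a^{*}_{i2} (b_{i1} - b^{*}_{i2})}{a_{i1} - a^{*}_{i2}} \right] \right\} - (b_{i1} - b^{*}_{i2}) \tag{22}$$

and, as before, subscripts 1 and 2 refer to the first and second calibrations, respectively.
Substituting $a^{*}_{i2}$ and $b^{*}_{i2}$ from Equations (1) and (2) into Equation (22), one obtains

$$H_i = \frac{2(A a_{i1} - a_{i2})}{D a_{i1} a_{i2}} \ln\left\{ 1 + \exp\left[ \frac{D a_{i1} a_{i2} (b_{i1} - A b_{i2} - B)}{A a_{i1} - a_{i2}} \right] \right\} - (b_{i1} - A b_{i2} - B). \tag{23}$$
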



Note that Equation (23) is independent of theta. Therefore, the potential problems from 
the TCC and ICC methods, Equations (17) and (20) respectively, are not a concern for 
the area-minimization approach. 
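
The closed-form area can be checked against brute-force numerical integration of the absolute difference between two ICCs. The sketch below is only an illustration: D = 1.7, the integration limits, the step count, the example item parameters, and the function names are all assumptions. The equal-discrimination case is handled separately, since the area then reduces to the absolute difference in difficulties.

```python
import math

D = 1.7  # scaling constant (assumed value)


def softplus(t):
    """Numerically stable ln(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def exact_unsigned_area(a1, b1, a2, b2):
    """Closed-form unsigned area between two 2-PL ICCs on a common metric."""
    d = b1 - b2
    if abs(a1 - a2) < 1e-12:      # parallel curves: area is |b1 - b2|
        return abs(d)
    k = 2.0 * (a1 - a2) / (D * a1 * a2)
    t = D * a1 * a2 * d / (a1 - a2)
    return abs(k * softplus(t) - d)


def numeric_area(a1, b1, a2, b2, lo=-20.0, hi=20.0, steps=40000):
    """Trapezoidal integral of |P1 - P2| over [lo, hi], as a cross-check."""
    def gap(theta):
        p1 = 1.0 / (1.0 + math.exp(-D * a1 * (theta - b1)))
        p2 = 1.0 / (1.0 + math.exp(-D * a2 * (theta - b2)))
        return abs(p1 - p2)

    h = (hi - lo) / steps
    total = 0.5 * (gap(lo) + gap(hi))
    total += sum(gap(lo + k * h) for k in range(1, steps))
    return total * h


# Hypothetical pair of items with different slopes and difficulties.
print(exact_unsigned_area(1.2, 0.5, 0.8, -0.3), numeric_area(1.2, 0.5, 0.8, -0.3))
```

The closed form needs no theta grid at all, which is the property the area-minimization approach relies on.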

Solution for A and B 

In view of Equation (21), the function to be minimized across n items in solving
for A and B may be expressed as:

$$f_1(A, B) = |H_1| + |H_2| + \ldots + |H_n|. \tag{24}$$

The absolute values in Equation (24) may not be easy to handle (mathematically), so an
alternative but equivalent expression for minimization may be written as

$$f_2(A, B) = H_1^2 + H_2^2 + \ldots + H_n^2. \tag{25}$$

Since Equations (24) and (25) contain either absolute or squared quantities, there is no
potential for cancellation effects in the area-minimization approach. The partial
derivatives of $f_2(A, B)$ with respect to A and B can be written as:

$$\frac{\partial f_2(A, B)}{\partial A} = 2H_1 \frac{\partial H_1}{\partial A} + 2H_2 \frac{\partial H_2}{\partial A} + \ldots + 2H_n \frac{\partial H_n}{\partial A} \tag{26}$$

and

$$\frac{\partial f_2(A, B)}{\partial B} = 2H_1 \frac{\partial H_1}{\partial B} + 2H_2 \frac{\partial H_2}{\partial B} + \ldots + 2H_n \frac{\partial H_n}{\partial B}. \tag{27}$$

Using Equation (23), the partial derivatives of $H_i$ may be expressed, after simplification,
as:

$$\frac{\partial H_i}{\partial A} = \frac{2}{D a_{i2}} \ln\left[ 1 + \exp(T) \right] - \frac{2 \exp(T)}{1 + \exp(T)} \cdot \frac{a_{i1}(b_{i1} - B) - a_{i2} b_{i2}}{A a_{i1} - a_{i2}} + b_{i2} \tag{28}$$

and

$$\frac{\partial H_i}{\partial B} = \frac{-2 \exp(T)}{1 + \exp(T)} + 1, \tag{29}$$

where

$$T = \frac{D a_{i1} a_{i2} (b_{i1} - A b_{i2} - B)}{A a_{i1} - a_{i2}}. \tag{30}$$


Given the mathematical complexity of Equations (28) and (29), it will not be easy
to find in closed form the linking constants (A and B) that minimize Equation (25). However, the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Press, Flannery, Teukolsky, &
Vetterling, 1986) can be used to solve for A and B numerically, using Equations (28) through (30).
The BFGS algorithm improves upon the Davidon-Fletcher-Powell algorithm, which is
commonly used in psychometric practice (e.g., for Stocking and Lord equating).
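
The following is a minimal sketch of such a minimization, not the program used in this research. It assumes D = 1.7, uses the rounded anchor-item estimates printed in Tables 4 and 5, starts from A = 1 and B = 0, and implements a small two-variable BFGS with a backtracking line search. The derivative with respect to A is coded in an algebraically rearranged form that avoids subtracting two large terms when A times the first discrimination is close to the second; all function names are hypothetical.

```python
import math

D = 1.7  # scaling constant (assumed value)

# (a_i1, b_i1, a_i2, b_i2): Form K is calibration 1, unequated Form L is calibration 2.
ITEMS = [
    (0.97, -0.43, 0.93, -0.60), (1.72, -0.05, 1.53, -0.22),
    (1.38, 0.68, 1.31, 0.62), (1.50, 0.09, 1.47, -0.04),
    (1.57, 0.70, 1.62, 0.56), (1.54, 0.37, 1.43, 0.27),
    (0.88, 0.87, 0.95, 0.82), (1.19, 1.49, 1.14, 1.40),
    (0.94, 1.59, 0.85, 1.62), (1.33, 0.86, 1.41, 0.79),
    (0.74, 2.46, 0.81, 2.15), (1.02, 0.96, 1.05, 0.99),
    (0.89, 1.36, 0.79, 1.29),
]


def softplus(t):
    """Numerically stable ln(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    """Numerically stable exp(t) / (1 + exp(t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def h_and_grad(item, A, B):
    """Signed area term H_i and its partial derivatives in A and B."""
    a1, b1, a2, b2 = item
    d = b1 - A * b2 - B
    delta = A * a1 - a2
    if abs(delta) < 1e-12:       # equal slopes after linking (limiting case)
        return d, -b2, -1.0      # the sign is irrelevant because H_i is squared
    t = D * a1 * a2 * d / delta
    sp, sg = softplus(t), sigmoid(t)
    h = 2.0 * delta / (D * a1 * a2) * sp - d
    dh_dA = 2.0 / (D * a2) * (sp - t * sg) - 2.0 * b2 * sg + b2
    dh_dB = 1.0 - 2.0 * sg
    return h, dh_dA, dh_dB


def objective(x):
    """Sum of squared areas over the anchor items, with its gradient."""
    f = gA = gB = 0.0
    for item in ITEMS:
        h, dA, dB = h_and_grad(item, x[0], x[1])
        f += h * h
        gA += 2.0 * h * dA
        gB += 2.0 * h * dB
    return f, (gA, gB)


def bfgs(fun, x, max_iter=200, tol=1e-10):
    """Two-variable BFGS with a backtracking (Armijo) line search."""
    h11, h12, h22 = 1.0, 0.0, 1.0            # inverse-Hessian approximation
    f, g = fun(x)
    for _ in range(max_iter):
        if math.hypot(g[0], g[1]) < tol:
            break
        p = (-(h11 * g[0] + h12 * g[1]), -(h12 * g[0] + h22 * g[1]))
        slope = g[0] * p[0] + g[1] * p[1]
        if slope >= 0.0:                      # not a descent direction: reset
            h11, h12, h22 = 1.0, 0.0, 1.0
            p, slope = (-g[0], -g[1]), -(g[0] ** 2 + g[1] ** 2)
        step = 1.0
        while True:
            x_new = (x[0] + step * p[0], x[1] + step * p[1])
            f_new, g_new = fun(x_new)
            if f_new <= f + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        if f_new >= f:                        # no further progress possible
            break
        s = (x_new[0] - x[0], x_new[1] - x[1])
        y = (g_new[0] - g[0], g_new[1] - g[1])
        sy = s[0] * y[0] + s[1] * y[1]
        if sy > 1e-12:                        # curvature condition holds
            rho = 1.0 / sy
            hy0, hy1 = h11 * y[0] + h12 * y[1], h12 * y[0] + h22 * y[1]
            c = rho * rho * (y[0] * hy0 + y[1] * hy1) + rho
            h11 += -2.0 * rho * s[0] * hy0 + c * s[0] * s[0]
            h12 += -rho * (s[0] * hy1 + hy0 * s[1]) + c * s[0] * s[1]
            h22 += -2.0 * rho * s[1] * hy1 + c * s[1] * s[1]
        x, f, g = x_new, f_new, g_new
    return x, f


(A_hat, B_hat), f_min = bfgs(objective, (1.0, 0.0))
print(f"A = {A_hat:.4f}, B = {B_hat:.4f}, sum of squared areas = {f_min:.5f}")
```

Table 2 reports A = 0.9852 and B = 0.1123 for the area-minimization approach. This sketch need not reproduce those values exactly, because it works from rounded estimates and an assumed value of D.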
The above procedure for estimating A and B within the two-parameter logistic
model is equally valid for the three-parameter model, provided each item has the same c 
parameter in both calibrations. One way to obtain such a common c parameter is to set 
the estimates of c from one calibration to equal those from the other (Stocking & Lord, 
1983). According to Divgi (1985), the estimation of c parameters is not affected by the 
change of metric and, hence, may be ignored in developing a common metric. 

With respect to the Rasch model, where only one linking constant (B) is needed
for metric transformation, the area-minimization approach results in a solution for B
which is equal to the difference in the means of the b-parameters. That is,

$$B = M_{b1} - A\,M_{b2}. \tag{31}$$

Additional details and the proof of this result for the Rasch model are given in Arenson
and Raju (2002).
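
A short sketch of why this result is plausible is given below. It is not the proof in Arenson and Raju (2002); it simply assumes that every item has the same discrimination in both calibrations after linking, so that the area between two parallel ICCs reduces to the absolute difference of their difficulties, and that A is treated as fixed.

```latex
% Assumption: a_{i1} = a^{*}_{i2} for every item, so |H_i| = |b_{i1} - b^{*}_{i2}|.
\begin{align*}
f_2(B) &= \sum_{i=1}^{n} H_i^2
        = \sum_{i=1}^{n} \left( b_{i1} - A\,b_{i2} - B \right)^2, \\
\frac{\partial f_2}{\partial B}
       &= -2 \sum_{i=1}^{n} \left( b_{i1} - A\,b_{i2} - B \right) = 0
        \;\Longrightarrow\;
        B = \frac{1}{n}\sum_{i=1}^{n} b_{i1} - A\,\frac{1}{n}\sum_{i=1}^{n} b_{i2}
          = M_{b1} - A\,M_{b2}.
\end{align*}
```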

An Illustration and Discussion 

To illustrate the area-minimization approach, as well as other approaches, for 
developing a common metric, data from a calibration administration (towards the end of 
the Spring 2001 semester) of two forms (Forms K and L) of a statewide high school 
algebra test were used. The test constitutes a partial requirement for graduation from high 
school. Both forms, containing 55 items each, were calibrated concurrently. The 
distributions of raw scores were similar for both forms, as shown in Table 1 . Raw scores 
were converted to a scale with a mean of 500 and a standard deviation of 50. Although 
the distributions of raw scores were similar, the tests contained different items, except for 
the 13 anchor items. Item parameters for these 13 items were used to transform the metric
underlying Form L onto the metric underlying Form K. Six different transformations
were obtained, one for each of the six linking methods described above.

The two-parameter logistic model (Lord, 1980) was used to estimate the item
parameters, separately, for Forms K and L. The two forms were calibrated with
PARDUX (Burket, 1991). Once the item parameters for the 13 anchor items were known,
separate linking constants (A and B) for each of the six methods were obtained and are
reported in Table 2. The linking constants for the Mean-Mean and Mean-Sigma methods
were computed with a program written in SAS. The linking constants for Haebara's ICC
method and Divgi's chi-square method were obtained with the IPLINK program (Lee & Oshima,
1996). It should be noted that the linking constants for Divgi's method were based on a
simplified version of Equation (15) (Oshima et al., 2000). The linking constants for
Stocking and Lord's TCC method were obtained with the PARDUX program. Finally,
the linking constants associated with the area-minimization approach were obtained with
a computer program specially written for this research. Means and standard deviations of
the a- and b-parameters for the anchor items in Form K are shown in Table 3. Also
shown in this table are the means and standard deviations of the unequated and the six
sets of equated item parameters for Form L. Table 4 displays the a-parameter estimates for the
common items from Form K, as well as the unequated parameter estimates from Form L.
In addition, Table 4 shows the equated estimates for each of the methods described. Table
5 displays similar information for the b-parameters.

The equating constants (see Table 2) appear to be quite similar across the six
linking procedures. The similarities between the Form K and the equated Form L
parameter estimates are best captured in Figures 1 and 2. The A constants are close to 1
and the B constants are close to 0 across the six procedures. This result is probably not
too surprising in view of the fact that the two forms were quite comparable in terms of
their raw score means and standard deviations. This is not to say that the estimated
linking constants will be this similar for other tests and/or forms or for the same test with
different sub-populations, as in the case of DIF research. The current example is designed
simply to illustrate the area-minimization approach, while presenting comparable data
from the other available linking procedures. This example is not intended to offer an
evaluation of the various procedures. There is certainly a need for a comprehensive
assessment of the various linking procedures, ideally with recommendations for
practitioners. Oshima et al. (2000) recently reported a Monte Carlo assessment of
different linking procedures in the multidimensional IRT context. A similar study in the
unidimensional IRT framework is highly desirable. Kolen and Brennan (1995) also
call for assessing the comparability and accuracy of the known linking
procedures. As Oshima et al. and Divgi (1985) noted, a major problem that one is likely
to encounter in such an investigation is the question of what criterion to use for
evaluating the results from different linking procedures. If the area measure is used as the
criterion for assessing the accuracy of various linking procedures, the area-minimization
method is likely to perform better than the other methods. If the difference between two
TCCs is used as the criterion, Stocking and Lord's method may do better than the
other methods. So, there is a definite need for defining an appropriate (or impartial)
criterion for assessing the accuracy of various linking procedures. Finally, there is a need
for extending the area-minimization approach to the polytomous case.
Footnote

¹The Loyd and Hoover (1980) method was designed for the Rasch model. They defined
the multiplicative constant A as the ratio of the discrimination constant of the second
calibration to that of the first calibration.



Common Metric in IRT



Table 1

Descriptive Statistics for Forms K and L

                Raw Score           Scale Score
Form    N       Mean     SD         Mean     SD       Reliability
K       6994    32.03    10.707     554.5    35.65    0.916
L       6941    32.97    10.388     551.0    36.90    0.916





Table 2

Equating (Linking) Constants

     Mean-Mean   Mean-Sigma   ICC (Haebara)   χ² (Divgi)   Stocking & Lord   Area-Min.
A    0.9758      0.9975       0.9792          0.9752       0.9761            0.9852
B    0.1180      0.1019       0.1146          0.1035       0.1020            0.1123

Table 3

Summary Statistics of Unequated and Equated Item Parameters

                            a                   b
                       Mean     SD         Mean     SD
Form K                 1.205    0.319      0.842    0.771
Form L
  Unequated            1.176    0.298      0.742    0.773
  Mean-Mean            1.235    0.328      0.941    0.752
  Mean-Sigma           1.179    0.299      0.842    0.771
  ICC (Haebara)        1.201    0.304      0.841    0.757
  χ² (Divgi)           1.266    0.336      1.021    0.733
  Stocking & Lord      1.203    0.305      0.826    0.753
  Area-Minimization    1.194    0.302      0.843    0.759

Table 4

a-Parameter Estimates for Common Items in Forms K and L

                    Form L
Item No.   Form K   Unequated   Mean-Mean   Mean-Sigma   ICC (Haebara)   χ² (Divgi)   Stocking & Lord   Area-Min.
7          0.97     0.93        0.99        0.93         0.95            1.02         0.95              0.95
8          1.72     1.53        1.76        1.53         1.56            1.80         1.57              1.55
9          1.38     1.31        1.41        1.31         1.34            1.45         1.34              1.33
10         1.50     1.47        1.54        1.47         1.50            1.58         1.50              1.49
29         1.57     1.62        1.61        1.62         1.65            1.65         1.66              1.65
30         1.54     1.43        1.58        1.43         1.46            1.62         1.46              1.45
31         0.88     0.95        0.90        0.95         0.97            0.92         0.97              0.96
32         1.19     1.14        1.22        1.14         1.16            1.25         1.17              1.16
33         0.94     0.85        0.96        0.85         0.87            0.98         0.87              0.87
43         1.33     1.41        1.36        1.41         1.44            1.39         1.44              1.43
44         0.74     0.81        0.76        0.81         0.83            0.78         0.83              0.82
45         1.02     1.05        1.05        1.05         1.07            1.08         1.07              1.06
46         0.89     0.79        0.91        0.79         0.81            0.93         0.81              0.80

Table 5

b-Parameter Estimates for Common Items in Forms K and L

                    Form L
Item No.   Form K   Unequated   Mean-Mean   Mean-Sigma   ICC (Haebara)   χ² (Divgi)   Stocking & Lord   Area-Min.
7          -0.43    -0.60       -0.30       -0.50        -0.47           -0.19        -0.49             -0.48
8          -0.05    -0.22        0.07       -0.12        -0.10            0.17        -0.11             -0.10
9           0.68     0.62        0.78        0.72         0.72            0.86         0.71              0.72
10          0.09    -0.04        0.21        0.06         0.08            0.31         0.07              0.08
29          0.70     0.56        0.80        0.66         0.66            0.88         0.65              0.67
30          0.37     0.27        0.48        0.37         0.38            0.57         0.37              0.38
31          0.87     0.82        0.97        0.92         0.92            1.05         0.90              0.92
32          1.49     1.40        1.57        1.50         1.49            1.63         1.47              1.49
33          1.59     1.62        1.67        1.72         1.70            1.73         1.68              1.70
43          0.86     0.79        0.96        0.89         0.89            1.04         0.87              0.89
44          2.46     2.15        2.52        2.25         2.22            2.56         2.20              2.23
45          0.96     0.99        1.05        1.09         1.08            1.13         1.06              1.08
46          1.36     1.29        1.45        1.39         1.38            1.52         1.36              1.38